6 Appendix
We observe that, for the self-attention layers, weights belonging to the same head are more strongly correlated. Additionally, the best grouping might depend on the type of the layer (e.g., key, query, value, or

To simplify the implementation, we treat all the different kernels in the self-attention as a type of fully-connected layer. We down-sample along each dimension to make the computation feasible. To relate this to the Frobenius norm, we square each element and normalize the resulting values.

In Figure 5, we compare the approximation error of the different approximation methods.
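The down-sampling and square-and-normalize steps can be sketched as follows. This is a minimal illustration and not the implementation used for the figures: the function names, the stride-based down-sampling, and the choice to normalize by the sum of squares (the squared Frobenius norm) are all assumptions, since the text does not specify the normalization constant or the down-sampling scheme.

```python
import math


def frobenius_norm(matrix):
    """Frobenius norm: square root of the sum of squared entries."""
    return math.sqrt(sum(x * x for row in matrix for x in row))


def downsample(matrix, stride):
    """Keep every `stride`-th row and column (assumed down-sampling scheme)."""
    return [row[::stride] for row in matrix[::stride]]


def normalized_squares(matrix):
    """Square each entry and divide by the squared Frobenius norm.

    With this (assumed) normalization the entries sum to 1, so each value
    is that entry's share of the squared Frobenius norm.
    """
    total = sum(x * x for row in matrix for x in row)
    if total == 0:
        # All-zero matrix: no energy to distribute.
        return [[0.0 for _ in row] for row in matrix]
    return [[x * x / total for x in row] for row in matrix]


# Hypothetical 4x4 weight matrix, used only for illustration.
weights = [
    [3.0, 1.0, 4.0, 1.0],
    [1.0, 1.0, 1.0, 1.0],
    [0.0, 1.0, 0.0, 1.0],
    [1.0, 1.0, 1.0, 1.0],
]

small = downsample(weights, 2)      # 2x2 matrix from rows/columns 0 and 2
shares = normalized_squares(small)  # per-entry share of the squared norm
```

Under this normalization, the per-entry values can be summed over any group of weights to read off how much of the squared Frobenius norm that group accounts for.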